Metasurface wavefront control for high-performance user-natural augmented reality waveguide glasses

Augmented reality (AR) devices, such as smart glasses, enable users to see both the real world and virtual images simultaneously, contributing to an immersive experience in interaction and visualization. Recently, to reduce the size and weight of smart glasses, waveguides incorporating holographic optical elements in the form of advanced grating structures have been used to provide lightweight alternatives to bulky helmet-type headsets. However, current waveguide displays often have limited resolution, efficiency and field-of-view, and require complex multi-step fabrication processes with lower yield. In addition, current AR displays often suffer from the vergence-accommodation conflict for augmented and virtual images, resulting in visual fatigue and eye strain. Here we report metasurface optical elements designed and experimentally implemented as a platform solution to overcome these limitations. Through careful dispersion control of the excited propagation and diffraction modes, we design and implement a high-resolution full-color prototype, combining analytical–numerical simulations, nanofabrication and device measurements. With metasurface control of light propagation, our prototype device achieves a 1080-pixel resolution, a field-of-view greater than 40° and an overall input–output efficiency greater than 1%, and addresses the vergence-accommodation conflict through a focal-free implementation. Furthermore, our AR waveguide is realized in a single metasurface-waveguide layer, aiding scalability and process yield control.


Results
Figure 1 shows the key architecture and approach of our MOE AR display waveguide, highlighting the display beam propagation in the AA' cross-section (Fig. 1a) and top view (Fig. 1b). First, by carefully designing each pixel of the MOEs, the RGB spectrum is guided through the eye lens to the retina, enabling close-to-Maxwellian-view operation; this focal-free display addresses the vergence-accommodation conflict. Second, the FoV of our MOE display is determined only by the metasurface grating structure, currently reaching ≈ 55° or higher, even with normal-index glass. Third, as shown in Fig. 1a, the In-MOE is slanted on the glass waveguide to direct wave propagation towards the Out-MOE. Without the slant, light would be diffracted in both the positive and negative directions (relative to the MOE surface normal), roughly halving the efficiency. Combined with rigorous coupled-wave analysis to minimize undesired diffraction modes, this lets us maximize the efficiency of the desired modes towards the output, exceeding 1%. This aids the embedding of augmented information and virtual images in real-world scenes, especially under outdoor light, as in the example of Fig. 1d. We also note that prior HOE-DOE AR waveguides, built from multiple layers of glass waveguides, have spurious diffraction modes and hence efficiencies well below 1%, requiring indoor operation or blocking 80% of outdoor light. Fourth, as shown in Fig. 1a, c, our MOE waveguide uses only a single glass layer for the whole RGB spectrum, reducing unwanted diffraction and increasing efficiency. The single-layer implementation also makes the device compact and lightweight, while simplifying MOE fabrication and improving yield.
We first define the FoV bounds set by conventional total internal reflection (TIR) in a waveguide 25,108-110 without a metasurface. As illustrated in Fig. 2a-d, for a ± 10° input, the ray-traced output is ∓ 10°; a larger output FoV can be obtained with larger input angles. However, the TIR requirement of the waveguide bounds the positive input angles to less than 13.3°. This limits the FoV to ≈ 26.7° or less for a glass waveguide with a refractive index of 1.46. For a higher-index glass with a refractive index of 1.6, as shown in Fig. 2e, the FoV can be increased to 34.9°, but it rapidly approaches an asymptotic limit. By controlling wave propagation with multi-period metasurfaces, Fig. 2f shows the FoV increased to ≈ 55° for normal-index glass with a refractive index of 1.46. This illustration is for 646 nm, supplementing the 520 nm overview shown in Fig. 1a. Using rigorous coupled-wave analysis (RCWA) [111][112][113][114] , the simple modal method (SMM) [115][116][117][118][119][120] and finite-difference time-domain (FDTD) 121 simulations, we optimize our MOE at each position for each wavelength. With the distance from the Out-MOE to the eyebox set by design and the desired FoV fixed, we fine-tune the MOE grating periods for the desired input-output angle at each pixel and each wavelength. Our MOE implementation is mainly limited by the resolution of nanofabrication lithography. Figure 3 then shows the MOE efficiency computation and optimization for different propagation x-positions along the metasurface y-center, mapped over a range of SiNx refractive indices, fill factors and SiNx film heights. Our MOE incorporates pixel-by-pixel phase control by introducing phase changes over distances on the order of a wavelength. These abrupt phase shifts provide freedom in controlling the wavefront, with the propagation of light governed by Fermat's principle 63 .
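The TIR bound above can be reproduced with a simple k-space model. This is offered as a sketch, not the authors' design code. It assumes a single-period, first-order grating obeying n_glass sin θ_g = sin θ_in + λ/Λ, which is equivalent to the generalized Snell's law with a constant phase gradient dΦ/dx = 2π/Λ. It also assumes a guided band that runs from the critical angle (n_glass sin θ_g = 1) all the way to grazing (n_glass sin θ_g = n_glass), centered on normal incidence. Under these idealized assumptions the full FoV is 2 arcsin((n - 1)/2). That gives a 13.3° half-angle and ≈ 26.6° for n = 1.46, in line with the ≈ 26.7° quoted, and 34.9° for n = 1.6. All function names are hypothetical.

```python
import math

def diffracted_sin(theta_in_deg, wavelength_nm, period_nm,
                   n_in=1.0, n_out=1.46, order=1):
    """sin(theta_out) from n_out*sin(theta_out) = n_in*sin(theta_in) + order*lambda/period.

    Same as the generalized Snell's law with a constant phase gradient
    dPhi/dx = 2*pi*order/period. |result| > 1 means the order is evanescent.
    """
    return (n_in * math.sin(math.radians(theta_in_deg))
            + order * wavelength_nm / period_nm) / n_out

def guided_by_tir(theta_in_deg, wavelength_nm, period_nm, n_glass=1.46):
    """True if the first diffracted order propagates in the glass beyond the critical angle."""
    s = diffracted_sin(theta_in_deg, wavelength_nm, period_nm, n_out=n_glass)
    return 1.0 / n_glass < s < 1.0

def centered_period_nm(wavelength_nm, n_glass):
    """Period that places normal incidence mid-band: lambda/period = (1 + n)/2."""
    return 2.0 * wavelength_nm / (1.0 + n_glass)

def tir_fov_bound_deg(n_glass):
    """Full FoV of a single-period coupler: the guided band spans n - 1 in sin(theta_in)."""
    return 2.0 * math.degrees(math.asin((n_glass - 1.0) / 2.0))

for n in (1.46, 1.6):
    print(f"n = {n}: FoV bound ≈ {tir_fov_bound_deg(n):.1f}°")   # 26.6°, 34.9°

# Illustrative period only (≈ 422.8 nm at 520 nm), not the fabricated MOE period.
p520 = centered_period_nm(520.0, 1.46)
print(guided_by_tir(13.0, 520.0, p520), guided_by_tir(14.0, 520.0, p520))   # True False
```

In this model, pushing the FoV past the single-period bound requires different grating periods for different parts of the field. That is the role of the multi-period metasurface described above.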
We observe that the efficiency is higher on the right side (positive x) of each MOE, reaching total input-output efficiencies of up to 50%. This can be explained with Fig. 1a, which shows the input-output beams at the Out-MOE for our slanted In-MOE design. With a slanted input and the Littrow mounting effect, the positive and negative output diffraction lobes have unequal efficiencies. α P (the positive diffraction lobe) has a lower diffraction efficiency than α N (the negative diffraction lobe, lying between the surface normal and 90°, in the reverted beam direction). These 2D maps are for a center wavelength of 520 nm at the y-center; other wavelengths and y-positions simply show an offset in the efficiency map or perturbed efficiencies.
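For reference, the Littrow configuration mentioned above is the geometry in which a diffracted order travels back along the incident beam: 2 n Λ sin θ_L = m λ for a grating of period Λ in a medium of index n. The hypothetical helper below computes θ_L. The 400 nm period in the example is an arbitrary assumption, not a value from this work. The unequal α P / α N lobe efficiencies themselves require a rigorous solver such as RCWA, not this geometric relation.

```python
import math

def littrow_angle_deg(wavelength_nm, period_nm, n_medium=1.0, order=1):
    """Angle at which order `order` retro-diffracts: 2*n*period*sin(theta_L) = order*lambda."""
    s = order * wavelength_nm / (2.0 * n_medium * period_nm)
    if abs(s) > 1.0:
        raise ValueError("no Littrow configuration for this wavelength/period/order")
    return math.degrees(math.asin(s))

# Hypothetical example: a 400 nm period inside n = 1.46 glass at 520 nm.
theta_L = littrow_angle_deg(520.0, 400.0, n_medium=1.46)
print(f"Littrow angle ≈ {theta_L:.1f}°")
```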
In optimizing for high-quality AR images, we observe two features in the 2D efficiency maps of Fig. 3: the average efficiency and the dark low-efficiency crossing lines. The efficiency fluctuations, including the dark crossing lines, can be understood by comparing the grating to the simple case of a slab waveguide, in which a wave excites discrete modes in the grating 113 . The diffraction properties and efficiency are mainly determined by the modes within the grating region. These modes propagate through the grating region with different effective indices and couple out at the grating-substrate interface 108 . The behavior of these modes produces the fluctuations and also causes the sharp dips in the efficiency maps of Fig. 3. To choose an implementation for a given overall design (FoV, full color and glass refractive index), parameters that give high average efficiency and the fewest dark crossing lines must be selected. Higher refractive indices generally give higher average efficiencies, but the number of dark lines almost doubles when the refractive index increases from 2.0 to 2.3. The fill factor and height shift the positions of the dark crossing lines and have optimal conditions for the highest average efficiency. These appear as bright islands, where the brightest pixels are located. The change in the optimal MOE height with wavelength (color), together with the combined average efficiency map, is shown in Fig. 3c. Similar trends appear in the maps, and the optimal height increases with wavelength. Based on these modeled maps, a SiNx refractive index of 2.14, an MOE height of 0.35 μm and a fill factor of 0.46 are chosen as the best conditions, avoiding most of the dark crossing lines in our display implementation.
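The selection step above (high combined average efficiency, away from dark crossing lines) can be sketched as a search over simulated efficiency maps. This is an illustrative assumption, not the authors' procedure. The sketch works in three steps:

1. It averages the per-color maps into a combined map.
2. It excludes grid points where any color falls below a hypothetical dark-line threshold.
3. It scores the remaining points with a small neighborhood average, so that broad bright islands win over isolated bright pixels.

The demo uses synthetic Gaussian maps, not the Fig. 3 data, and `pick_design` is a hypothetical name.

```python
import numpy as np

def pick_design(eff_maps, fill_factors, heights, dark_threshold=0.05, window=1):
    """eff_maps: (n_colors, n_fill_factors, n_heights) simulated efficiencies.

    Returns (fill_factor, height, score) maximizing the neighborhood-averaged
    combined map, never choosing a point where any color is 'dark'.
    """
    eff_maps = np.asarray(eff_maps, dtype=float)
    combined = eff_maps.mean(axis=0)                  # combined average over colors
    dark = (eff_maps < dark_threshold).any(axis=0)    # dark line in any color
    k = 2 * window + 1
    padded = np.pad(combined, window, mode="edge")
    smoothed = np.zeros_like(combined)
    rows, cols = combined.shape
    for di in range(k):                               # k x k box average
        for dj in range(k):
            smoothed += padded[di:di + rows, dj:dj + cols]
    smoothed /= k * k
    smoothed[dark] = -np.inf
    i, j = np.unravel_index(np.argmax(smoothed), smoothed.shape)
    return float(fill_factors[i]), float(heights[j]), float(smoothed[i, j])

# Synthetic demo: one bright island, one dark line, one isolated bright pixel.
ff = np.linspace(0.30, 0.70, 41)
h = np.linspace(0.20, 0.50, 31)                       # micrometres
FF, H = np.meshgrid(ff, h, indexing="ij")
island = 0.1 + 0.4 * np.exp(-((FF - 0.46) ** 2 + (H - 0.35) ** 2) / (2 * 0.05 ** 2))
maps = np.stack([island, island, island])             # same map for R, G, B
maps[:, 25, :] = 0.0                                  # synthetic dark crossing line
maps[:, 35, 5] = 0.9                                  # isolated bright pixel
best_ff, best_h, score = pick_design(maps, ff, h)
print(f"fill factor {best_ff:.2f}, height {best_h:.2f} um, score {score:.3f}")
```

Favoring broad islands is a design choice. A point in the middle of an island should tolerate small fabrication deviations in fill factor and height better than a narrow peak next to a dark line. Without the neighborhood average, the isolated 0.9 pixel would be selected.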
Each input/output MOE has a maximum simulated efficiency close to 50%, and the diffraction efficiency is divided among the three colors (one third at each input/output MOE). The theoretical efficiency is therefore calculated to be close to 2.8%. Following our AR waveguide design, we fabricated the nanostructured metasurface optical elements in a silicon foundry process, with nitride film deposition, deep-ultraviolet lithography and MOE nanopatterning. Figure 4 summarizes our fabricated MOE wafer. Fig. 4b shows a cross-section scanning electron micrograph (SEM) of our grating structures, and Fig. 4c shows a zoomed-in top-view SEM of our optimized full-color MOEs. The fabricated dimensions and specifications meet our design. After each fabrication step, the MOE critical dimensions are carefully examined to confirm the design fidelity of each metasurface region. Figure 5 shows several experimental setups built specifically for characterizing the AR waveguide display. First, Fig. 5a-c show the distortion calibration of the In-MOE and the focal length calibration of the Out-MOE across the red-green-blue spectrum. The distortion calibration 122,123 of the In-MOE is performed by measuring the relative distances among nine laser spots at the Out-MOE. The small In-MOE distortion has little effect on the mono-MOE and the color-MOE. Based on the calibrated distortion, we modified the MOE and reduced the small distortions to negligible levels by improving the distortion calibration setup and optimizing the analysis software. The focal length calibration is performed by tracking the nine laser spots exiting the Out-MOE. This determines the focal spot astigmatism of the MOE, a key parameter for our MOE design. Figure 5d shows the instrumentation for FoV [124][125][126][127] and input-output efficiency characterization.
After guiding the collimated laser beam into the In-MOE, we obtain the FoV by scanning the beam size near the focal plane of the Out-MOE. The input-output efficiency is obtained from the measured input and output powers. By changing the input beam size and position on the In-MOE, we obtain the full efficiency map of the MOE, which helps improve its uniformity. Figure 5e shows the setup built for modulation transfer function (MTF) characterization based on point-source mapping 123 . A focused laser beam passes through the In-MOE, the Out-MOE and an eye-equivalent lens, and is imaged onto a CMOS camera.
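The ≈ 2.8% theoretical budget stated earlier follows from simple arithmetic. Roughly 50% peak efficiency at each of the In-MOE and the Out-MOE, with one third of it going to each color at each element, gives (0.5/3) × (0.5/3) = 1/36 ≈ 2.8%. The sketch below reproduces that number. It also computes a measured efficiency as P_out/P_in over a grid of input positions, as in the efficiency-map measurement described above. The function names and the power readings are hypothetical. Waveguide propagation and bonding losses are ignored, and min/max is just one common choice of uniformity metric.

```python
import numpy as np

def theoretical_efficiency(in_moe_eff=0.5, out_moe_eff=0.5, n_colors=3):
    """Input-output efficiency when each MOE's peak efficiency is shared evenly by the colors."""
    return (in_moe_eff / n_colors) * (out_moe_eff / n_colors)

def efficiency_map(p_in, p_out):
    """Measured efficiency at each input-beam position: P_out / P_in (same power units)."""
    return np.asarray(p_out, dtype=float) / np.asarray(p_in, dtype=float)

def uniformity(eff):
    """One common figure of merit: min/max over the map (1.0 means perfectly uniform)."""
    eff = np.asarray(eff, dtype=float)
    return float(eff.min() / eff.max())

eta = theoretical_efficiency()
print(f"theoretical: {eta:.4f} ({100 * eta:.1f}%)")   # 0.0278 (2.8%)

# Made-up readings (mW) on a 2 x 3 grid of input positions, only to exercise the functions.
p_in = np.full((2, 3), 1.0)
p_out = np.array([[0.012, 0.015, 0.018],
                  [0.011, 0.014, 0.016]])
eff = efficiency_map(p_in, p_out)
print(f"uniformity (min/max): {uniformity(eff):.2f}")
```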
By Fourier analysis of the measured images, Fig. 6 shows the quantified MTF of our MOE waveguide. The MTF specifies how the relative contrast at different spatial frequencies is transferred by our MOE display system 6 . Here we select a Sony laser projector (MP-CL1; 60 Hz image update rate, 1920 × 720 resolution, 43,200 lines/s) as the compact optical engine. In this setup we remove the MTF limitations of the optical engine. These limitations arise because off-the-shelf lenses cannot balance all aberrations, including spherical aberration, chromatic aberration, astigmatism, field curvature and distortion. The contrast sensitivity of the human eye depends on several conditions, such as the luminance, the viewing angle of the object and the surrounding illumination [128][129][130] . In our model, we aggregate these conditions and assume that the contrast sensitivity of the human eye is above 0.4 131 when the image resolution is below 1080 pixels (full resolution). In addition, the average diameter of the human retina is 24 mm 132 , and its effective diameter within the FoV of our MOE is 20 mm. We therefore set the MOE display contrast target at 0.4, with a desired spatial frequency of up to 27 lps/mm [1080 pixels ÷ 2 (pixels/lps) ÷ 20 mm]. Under illumination by a focused red laser, the measured MTF of the mono-MOE is shown as the grey curve in Fig. 6a. For a contrast of 0.4, a 1064 × 1064 pixel display (≈ 29.0 lps/mm) is experimentally observed for the red mono-MOE, close to our target resolution of 1080 × 1080. For a contrast of 0.4, our color-MOE shows an experimental resolution of 520 × 520 pixels. This corresponds to ≈ 13.0 lps/mm and is mainly bounded by the green segments of our MOE. The red and blue segments have higher resolutions of ≈ 17.0 and ≈ 15.3 lps/mm, respectively, in this proof-of-principle demonstration.
The green-segment resolution of the color-MOE can be improved by optimizing fabrication to increase input-output efficiency. The more complex color-MOE currently has lower resolution than the mono-MOE because of beam overlap between the red, green and blue segments and the resulting reduced intensity contrast. Fig. 6b compares the red mono-MOE at four positions: center, left, bottom and corner. For a contrast of 0.4, the red mono-MOE reaches its highest resolution at the center and bottom, ≈ 29.0 and ≈ 30.4 lps/mm, respectively. The left edge and corner show slightly lower resolution, ≈ 25.5 and ≈ 23.9 lps/mm, respectively, due to lower input/output efficiency at those points. The resolution at the edge and corner can therefore be increased in the same way as for the color-MOE. The MTF shown here is free of lens aberrations. However, the actual optical engine, built from off-the-shelf lenses, is currently limited mainly by a slight mismatch with our MOE caused by unbalanced aberrations. To overcome this limitation, we will use customized lenses to optimize the optical engine for our MOE.
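The MTF quantification above can be sketched under stated assumptions. The MTF is taken as the normalized magnitude of the Fourier transform of a line spread function (LSF) extracted from the point-source image, and the contrast threshold is 0.4. Spatial frequency is converted to pixels across the 20 mm effective retinal field at 2 pixels per line pair, so 27 lps/mm corresponds to 1080 pixels and 13.0 lps/mm to 520 pixels. The Gaussian LSF (σ = 8 μm) is synthetic. It was chosen only so that its 0.4 crossing lies near the 27 lps/mm target and is not measured data. The function names are hypothetical.

```python
import numpy as np

def mtf_from_lsf(lsf, dx_mm):
    """MTF as |FFT(LSF)| normalized to 1 at zero spatial frequency."""
    lsf = np.asarray(lsf, dtype=float)
    spectrum = np.abs(np.fft.rfft(lsf))
    freqs = np.fft.rfftfreq(lsf.size, d=dx_mm)        # cycles/mm, i.e. lps/mm
    return freqs, spectrum / spectrum[0]

def frequency_at_contrast(freqs, mtf, contrast=0.4):
    """First spatial frequency where the MTF drops to `contrast` (linear interpolation)."""
    below = np.nonzero(mtf <= contrast)[0]
    if below.size == 0:
        return float(freqs[-1])                        # never drops within the sampled band
    k = below[0]
    if k == 0:
        return float(freqs[0])                         # contrast >= 1: threshold met at DC
    f0, f1, m0, m1 = freqs[k - 1], freqs[k], mtf[k - 1], mtf[k]
    return float(f0 + (contrast - m0) * (f1 - f0) / (m1 - m0))

def lps_per_mm_to_pixels(f_lps_mm, field_mm=20.0):
    """Pixels across the field at 2 pixels per line pair."""
    return 2.0 * f_lps_mm * field_mm

# Synthetic Gaussian LSF (sigma = 8 um) sampled every 1 um over +/-0.5 mm.
dx = 0.001
x = np.arange(-500, 501) * dx
lsf = np.exp(-x**2 / (2 * 0.008**2))
freqs, mtf = mtf_from_lsf(lsf, dx)
f40 = frequency_at_contrast(freqs, mtf, 0.4)
print(f"MTF = 0.4 at ≈ {f40:.1f} lps/mm -> {lps_per_mm_to_pixels(f40):.0f} pixels")
```

A Gaussian LSF is convenient here because its MTF is known in closed form, exp(-2π²σ²f²), which makes the FFT result easy to check. A measured LSF would simply replace `lsf`.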

Discussion
With the MTF determined, and as a proof-of-principle, we build a test system consisting of the optical engine, the MOEs (either the green mono-MOE or the color-MOE prototype), an eye-equivalent lens and a retina-equivalent white screen, as shown in Fig. 7. An example input image of 1080 × 720 pixels is shown in Fig. 7a. For the green mono-MOE, only the green laser in our laser beam scanning projector is turned on. Only the green channel of the original input image (Fig. 7a) is therefore projected through the whole system onto the retina-equivalent white screen. This is depicted in Fig. 7b for the metasurface display demonstration. The boundary edges of the green image are not captured because of the size of the cylindrical lens pair in our demonstration. For the color-MOE, all lasers (red, green and blue) in the laser beam scanning projector are turned on. All RGB channels of the original input image are therefore projected through our metasurface waveguide demonstration onto the retina-equivalent white screen. This is illustrated in Fig. 7c as a proof-of-principle. The image is reddish because the red segments of the color-MOE currently have higher efficiency than the green and blue segments. This can be rebalanced by optimizing the RGB efficiency of the color-MOE. Note that the camera photographs (Fig. 7b-d) and document printouts are displayed at lower resolution than in live operation. The modulation transfer function (Fig. 6) is the more rigorous demonstration of the metasurface glass waveguide display performance. To show the capability of our MOE in a real-world application, Fig. 8 shows an augmented reality image captured on a CMOS camera. The image was observed through a red mono-MOE prototype with the Sony laser projector as the optical engine.
We also note that the main cause of image degradation is the mismatch between the optical engine and the MOEs, which introduces spherical and chromatic aberrations, astigmatism, field curvature and distortion. Nevertheless, this formation of multipixel displays across the input MOE, output MOE, waveguide, optical engine and imaging sub-system establishes nanostructured metasurfaces as a platform for potential AR displays. Table 1 summarizes and compares the performance of our prototype with prototypical AR/MR waveguide displays 29,[133][134][135][136][137][138] . These prior approaches use HOE-DOE and waveguides, typically consisting of in-coupling, intermediate and out-coupling stages. However, they have low output efficiencies (<< 1%), since their multiple diffraction elements and waveguides generate more undesired diffracted light. Our display architecture, in contrast, is achieved with one waveguide and two MOEs. The prior DOEs also typically have smaller horizontal FoVs, below 40°, and their multi-layer grating structures are more challenging to scale up in fabrication. The current limitation of our single-layer, two-MOE implementation is a smaller eyebox, so the virtual image is cropped when the eye moves away from the center position. However, mechanical and optical methods for eyebox expansion are widely researched. More advanced metasurface concepts, including eye-tracking 139 and increased viewpoints 140 , could potentially overcome this. We also note that prior implementations are single-focal or dual-focal, with the user seeing the virtual object clearly only when focusing on that plane. In other words, the user cannot see virtual objects at infinity while looking at a nearby object.
In contrast, our MOE architecture is focal-free by design, with the image projected onto the retina. This alleviates the vergence-accommodation conflict and allows the virtual object to be seen clearly whether the user focuses near or at infinity. Enabled by numerical design optimization and foundry-based nanofabrication, we demonstrate experimental proof-of-principle operation of metasurface optical elements towards AR/MR waveguide displays. Our metasurfaces enable an FoV greater than 40°, input-output efficiencies greater than 1% and a focal-free implementation, in support of augmented display technologies.

Methods
Device fabrication. First, a 350-nm Si-rich SiNx layer is deposited on 500-μm-thick fused silica wafers by low-pressure chemical vapor deposition (LPCVD, Tystar Titan II) with a gas mixture of SiH2Cl2 and NH3. The silicon nitride layer is then patterned by lithography at one of two sites. At Broadcom Inc., patterning uses an optimized ASML PAS5500-1150 scanner capable of 90 nm resolution, with 280 nm of positive resist and 40 nm of bottom anti-reflective coating. At UCSB Nanofab, it uses a 248 nm DUV ASML 5500 stepper, with UV210 positive resist and DUV42P-6 top anti-reflective coating with thicknesses of 230 nm and 60 nm, respectively. Subsequently, the SiNx layer is etched at UCLA Nanolab by dry reactive ion etching in a ULVAC NLD 570 fluorine etcher, using the photoresist as the etch mask. The etch parameters were 38 sccm argon, 4 sccm oxygen and 38 sccm CHF3, with 700 W ICP power and 100 W RIE power at a pressure of 3 mTorr. The Si3N4 etch achieved a 3:1 aspect ratio with 150 nm feature sizes and a sidewall angle of 86 degrees, with a mask selectivity of 1.8:1. The wafers are then diced. Each input-output MOE is then bonded to the glass waveguide for testing. Separate input-output MOEs are examined by SEM (Hitachi S4700) for sidewall and dimensional characterization.
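Two quick checks follow from the process numbers above. They are offered as a sketch with simplifying assumptions. First, the resist consumed while etching the 350 nm film at a 1.8:1 mask selectivity is about 350/1.8 ≈ 194 nm, which both the 280 nm and 230 nm resist thicknesses cover. Second, an 86° sidewall on a 350 nm tall line widens its base by about 2 × 350/tan 86° ≈ 49 nm relative to its top. The model ignores anti-reflective-coating removal, resist loss during development and any etch overshoot, and it treats the line profile as a symmetric trapezoid. The function names are hypothetical.

```python
import math

def min_resist_nm(etch_depth_nm, selectivity):
    """Resist consumed while etching depth d at film:mask selectivity S, i.e. d / S."""
    return etch_depth_nm / selectivity

def bottom_widening_nm(height_nm, sidewall_deg):
    """Extra base width of a symmetric trapezoidal line (both sidewalls) vs. its top."""
    return 2.0 * height_nm / math.tan(math.radians(sidewall_deg))

need = min_resist_nm(350.0, 1.8)
for label, resist_nm in (("Broadcom stack", 280.0), ("UCSB stack", 230.0)):
    print(f"{label}: {resist_nm:.0f} nm resist, margin {resist_nm - need:.0f} nm")
print(f"base widening at 86°: {bottom_widening_nm(350.0, 86.0):.0f} nm")
```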